My ML Project

Authors
Affiliation

Name I, First Name I

Name of the University

Name II, First Name II

Published

May 1, 2024

Abstract

The following machine learning project focuses on…

[1] "C:/Users/UrsHu/Pillars/Learn/Academic/Master/Semestre 2/machine learning/Project/Machine_Learning"

1 Introduction

  • Overview and Motivation
  • Related Work
  • Research questions

2 TESTING if R works and if Python works

#> [1] "hello"
#> 30.0

3 Data

  • Sources
  • Description
  • Wrangling/cleaning
  • Spotting mistakes and missing data (could be part of EDA too)
  • Listing anomalies and outliers (could be part of EDA too)

3.1 Main dataset Cleaning

#> [1] "C:/Users/UrsHu/Pillars/Learn/Academic/Master/Semestre 2/machine learning/Project/Machine_Learning/docs"
#>     price number_of_rooms            address canton property_type
#> 1 1800000              65 1844 Villeneuve VD   Vaud     Apartment
#> 2 1980000              55      1820 Montreux   Vaud     Apartment
#> 3  488000              35         1882 Gryon   Vaud     Apartment
#> 4 1755000               7      1820 Montreux   Vaud     Apartment
#> 5  650000              25       1815 Clarens   Vaud     Apartment
#> 6 1490000              45          1260 Nyon   Vaud     Apartment
#>   floor year_category
#> 1    eg        0-1919
#> 2    eg        0-1919
#> 3    eg        0-1919
#> 4    eg        0-1919
#> 5    eg        0-1919
#> 6    eg        0-1919

3.2 Creating Variable zip_code and merging with AMTOVZ_CSV_LV95

#>     price number_of_rooms            address canton property_type
#> 1 1800000              65 1844 Villeneuve VD   Vaud     Apartment
#> 2 1980000              55      1820 Montreux   Vaud     Apartment
#> 3  488000              35         1882 Gryon   Vaud     Apartment
#> 4 1755000               7      1820 Montreux   Vaud     Apartment
#> 5  650000              25       1815 Clarens   Vaud     Apartment
#> 6 1490000              45          1260 Nyon   Vaud     Apartment
#>   floor year_category
#> 1    eg        0-1919
#> 2    eg        0-1919
#> 3    eg        0-1919
#> 4    eg        0-1919
#> 5    eg        0-1919
#> 6    eg        0-1919
#>     price number_of_rooms            address canton property_type
#> 1 1800000              65 1844 Villeneuve VD   Vaud     Apartment
#> 2 1980000              55      1820 Montreux   Vaud     Apartment
#> 3  488000              35         1882 Gryon   Vaud     Apartment
#> 4 1755000               7      1820 Montreux   Vaud     Apartment
#> 5  650000              25       1815 Clarens   Vaud     Apartment
#> 6 1490000              45          1260 Nyon   Vaud     Apartment
#>   floor year_category zip_code
#> 1    eg        0-1919     1844
#> 2    eg        0-1919     1820
#> 3    eg        0-1919     1882
#> 4    eg        0-1919     1820
#> 5    eg        0-1919     1815
#> 6    eg        0-1919     1260
#>       Ortschaftsname  PLZ Zusatzziffer       Gemeindename BFS.Nr
#> 1    Aeugst am Albis 8914            0    Aeugst am Albis      1
#> 2        Aeugstertal 8914            2    Aeugst am Albis      1
#> 3          Zwillikon 8909            0 Affoltern am Albis      2
#> 4 Affoltern am Albis 8910            0 Affoltern am Albis      2
#> 5         Bonstetten 8906            0         Bonstetten      3
#> 6          Sihlbrugg 6340            4    Hausen am Albis      4
#>   Kantonskürzel       E       N Sprache   Validity
#> 1            ZH 2679403 1235842      de 2008-07-01
#> 2            ZH 2679815 1237404      de 2008-07-01
#> 3            ZH 2675280 1238108      de 2008-07-01
#> 4            ZH 2676852 1236930      de 2008-07-01
#> 5            ZH 2677412 1241078      de 2008-07-01
#> 6            ZH 2686082 1230649      de 2008-07-01
#>                 City zip_code Canton_code
#> 1    Aeugst am Albis     8914          ZH
#> 2        Aeugstertal     8914          ZH
#> 3          Zwillikon     8909          ZH
#> 4 Affoltern am Albis     8910          ZH
#> 5         Bonstetten     8906          ZH
#> 6          Sihlbrugg     6340          ZH
#>       zip_code    price number_of_rooms
#> 1           25  2200000              10
#> 2           25  2200000              65
#> 3           26  1995000              75
#> 4           26   870490              45
#> 5          322   870000              25
#> 6          322  1295770              45
#> 2253      1200  2450000               6
#> 2254      1200   982130              45
#> 11886     1919  2535730              55
#> 11887     1919   230000              15
#> 11888     1919  1415380              35
#> 11889     1919  1043260              45
#> 11890     1919  2535730              55
#> 17993     2500  1050000              45
#> 17994     2500  1100000               5
#> 17995     2500   887500              55
#> 17996     2500   870500              45
#> 17997     2500  1176820              45
#> 17998     2500  1159550              35
#> 17999     2500  1927050              45
#> 18000     2500   892500              45
#> 18001     2500   887500              45
#> 18002     2500   420000              45
#> 18003     2500   877500              45
#> 18004     2500   885500              55
#> 18005     2500   872500              45
#> 19603     3000  1448610              45
#> 19604     3000  1515060              45
#> 19605     3000   956880              45
#> 19606     3000  1222680              35
#> 19607     3000  1448610              45
#> 19608     3000  1448610              45
#> 19609     3000  1515060              45
#> 19610     3000   820000              55
#> 19611     3000  1222680              35
#> 19612     3000  1590000              55
#> 19613     3000  1448610              45
#> 27169     4000  2100000              65
#> 27170     4000   975000              45
#> 30708     5201   963520              45
#> 33490     6511   584760               3
#> 33927     6547 19935000              55
#> 35207     6602   270000              15
#> 35208     6602  3721200              55
#> 35209     6602  3721200              55
#> 35210     6604  2644710              35
#> 35211     6604  2644710              35
#> 35212     6604  1142940              45
#> 35213     6604   610000              25
#> 35214     6604   810690              35
#> 35215     6604   860000              35
#> 35216     6604   917010              45
#> 35217     6604  1010040              45
#> 40817     6901  3628170              45
#> 40818     6911   877140              55
#> 40819     6911   810690              45
#> 40820     6911   730950              45
#> 40821     6911   465150              35
#> 42848     7133  2246010              35
#> 42861     7135  3575010              65
#> 43231     8000  1295770              35
#> 43232     8000  2100000              45
#> 43233     8000  2495000              55
#> 44144     8238   739000              35
#> 44145     8238   739000              35
#> 44146     8238   716000              35
#> 44147     8238   716000              35
#> 44148     8238   325600               3
#> 44889     8423  2910510              45
#> 44890     8423  2804190              55
#> 47001     9002  3787650              45
#> 47621     9241   724300              35
#>                                                  address
#> 1                                       1000 Lausanne 25
#> 2                                       1000 Lausanne 25
#> 3                          Lausanne 26, 1000 Lausanne 26
#> 4                                       1000 Lausanne 26
#> 5                    Via Cuolm Liung 30d, 7032 Laax GR 2
#> 6                       Via Murschetg 29, 7032 Laax GR 2
#> 2253                                         1200 Genève
#> 2254  Chemin des pralets, 74100 Etrembières, 1200 Genève
#> 11886                                      1919 Martigny
#> 11887                                      1919 Martigny
#> 11888                                      1919 Martigny
#> 11889                                      1919 Martigny
#> 11890                                      1919 Martigny
#> 17993                    Hohlenweg 11b, 2500 Biel/Bienne
#> 17994                                   2500 Biel/Bienne
#> 17995                                   2500 Biel/Bienne
#> 17996                                   2500 Biel/Bienne
#> 17997                                   2500 Biel/Bienne
#> 17998                                   2500 Biel/Bienne
#> 17999                                        2500 Bienne
#> 18000                                   2500 Biel/Bienne
#> 18001                                   2500 Biel/Bienne
#> 18002                                   2500 Biel/Bienne
#> 18003                                   2500 Biel/Bienne
#> 18004                                   2500 Biel/Bienne
#> 18005                                   2500 Biel/Bienne
#> 19603                                          3000 Bern
#> 19604                                          3000 Bern
#> 19605                                          3000 Bern
#> 19606                                          3000 Bern
#> 19607                                          3000 Bern
#> 19608                                          3000 Bern
#> 19609                                          3000 Bern
#> 19610                                          3000 Bern
#> 19611                                          3000 Bern
#> 19612                                          3000 Bern
#> 19613                                          3000 Bern
#> 27169                                         4000 Basel
#> 27170                                         4000 Basel
#> 30708                                      5201 Brugg AG
#> 33490                                     6511 Cadenazzo
#> 33927                               Augio 1F, 6547 Augio
#> 35207                                       6602 Muralto
#> 35208                                       6602 Muralto
#> 35209                                       6602 Muralto
#> 35210                                       6604 Solduno
#> 35211                                       6604 Solduno
#> 35212                                       6604 Solduno
#> 35213                                       6604 Solduno
#> 35214                                       6604 Solduno
#> 35215                                       6604 Solduno
#> 35216                                       6604 Locarno
#> 35217                                       6604 Locarno
#> 40817                                        6901 Lugano
#> 40818                             6911 Campione d'Italia
#> 40819                             6911 Campione d'Italia
#> 40820                             6911 Campione d'Italia
#> 40821                             6911 Campione d'Italia
#> 42848                  Inder Platenga 34, 7133 Obersaxen
#> 42861                                       7135 Fideris
#> 43231                                        8000 Zürich
#> 43232                                        8000 Zürich
#> 43233                                        8000 Zürich
#> 44144                         8238 Büsingen am Hochrhein
#> 44145                         8238 Büsingen am Hochrhein
#> 44146       Junkerstrasse 85, 8238 Büsingen am Hochrhein
#> 44147       Junkerstrasse 85, 8238 Büsingen am Hochrhein
#> 44148      Stemmerstrasse 14, 8238 Büsingen am Hochrhein
#> 44889                      Chüngstrasse 48, 8423 Embrach
#> 44890                      Chüngstrasse 60, 8423 Embrach
#> 47001                     6900 Lugano 2 Paradiso Caselle
#> 47621                                       9241 Kradolf
#>             canton    property_type floor year_category City
#> 1             Vaud     Single house           1919-1945 <NA>
#> 2             Vaud            Villa           2006-2010 <NA>
#> 3             Vaud            Villa           1961-1970 <NA>
#> 4             Vaud        Apartment noteg     2016-2024 <NA>
#> 5          Grisons        Apartment    eg     2016-2024 <NA>
#> 6          Grisons        Apartment noteg     2011-2015 <NA>
#> 2253        Geneva Bifamiliar house           1981-1990 <NA>
#> 2254        Geneva Bifamiliar house           2016-2024 <NA>
#> 11886       Valais       Attic flat noteg     2016-2024 <NA>
#> 11887       Valais        Apartment    eg     2016-2024 <NA>
#> 11888       Valais        Apartment noteg     2016-2024 <NA>
#> 11889       Valais        Apartment noteg     2016-2024 <NA>
#> 11890       Valais        Apartment noteg     2016-2024 <NA>
#> 17993         Bern     Single house           2001-2005 <NA>
#> 17994         Bern     Single house           2001-2005 <NA>
#> 17995         Bern     Single house           2016-2024 <NA>
#> 17996         Bern     Single house           2016-2024 <NA>
#> 17997         Bern            Villa           2016-2024 <NA>
#> 17998         Bern            Villa           2016-2024 <NA>
#> 17999         Bern     Single house           2016-2024 <NA>
#> 18000         Bern     Single house           2016-2024 <NA>
#> 18001         Bern     Single house           2016-2024 <NA>
#> 18002         Bern        Apartment noteg     1971-1980 <NA>
#> 18003         Bern     Single house           2016-2024 <NA>
#> 18004         Bern     Single house           2016-2024 <NA>
#> 18005         Bern     Single house           2016-2024 <NA>
#> 19603         Bern        Apartment    eg     2016-2024 <NA>
#> 19604         Bern        Apartment    eg     2016-2024 <NA>
#> 19605         Bern        Apartment    eg     2016-2024 <NA>
#> 19606         Bern        Apartment noteg     2016-2024 <NA>
#> 19607         Bern        Apartment noteg     2016-2024 <NA>
#> 19608         Bern        Apartment    eg     2016-2024 <NA>
#> 19609         Bern        Apartment    eg     2016-2024 <NA>
#> 19610         Bern        Apartment noteg     2016-2024 <NA>
#> 19611         Bern           Duplex noteg     2016-2024 <NA>
#> 19612         Bern        Apartment noteg     1991-2000 <NA>
#> 19613         Bern        Roof flat noteg     2016-2024 <NA>
#> 27169  Basel-Stadt            Villa           2016-2024 <NA>
#> 27170  Basel-Stadt     Single house           2016-2024 <NA>
#> 30708       Aargau        Apartment noteg     2016-2024 <NA>
#> 33490       Ticino        Apartment noteg     2016-2024 <NA>
#> 33927      Grisons     Single house           2016-2024 <NA>
#> 35207       Ticino        Apartment    eg     1961-1970 <NA>
#> 35208       Ticino     Single house           1981-1990 <NA>
#> 35209       Ticino     Single house           1981-1990 <NA>
#> 35210       Ticino       Attic flat noteg     2011-2015 <NA>
#> 35211       Ticino        Apartment noteg     2011-2015 <NA>
#> 35212       Ticino        Apartment noteg     2016-2024 <NA>
#> 35213       Ticino        Apartment noteg     2016-2024 <NA>
#> 35214       Ticino        Apartment noteg     2016-2024 <NA>
#> 35215       Ticino        Apartment noteg     2016-2024 <NA>
#> 35216       Ticino        Apartment noteg     2011-2015 <NA>
#> 35217       Ticino        Apartment noteg     2011-2015 <NA>
#> 40817       Ticino       Attic flat noteg     2011-2015 <NA>
#> 40818       Ticino     Single house           1971-1980 <NA>
#> 40819       Ticino        Apartment    eg     1946-1960 <NA>
#> 40820       Ticino        Apartment noteg     1991-2000 <NA>
#> 40821       Ticino        Apartment noteg     1946-1960 <NA>
#> 42848      Grisons     Single house           2006-2010 <NA>
#> 42861      Grisons     Single house              0-1919 <NA>
#> 43231       Zurich     Single house           2016-2024 <NA>
#> 43232       Zurich        Apartment noteg     2016-2024 <NA>
#> 43233       Zurich        Apartment noteg        0-1919 <NA>
#> 44144 Schaffhausen        Apartment    eg     2016-2024 <NA>
#> 44145 Schaffhausen       Attic flat    eg     2016-2024 <NA>
#> 44146 Schaffhausen       Attic flat noteg     2016-2024 <NA>
#> 44147 Schaffhausen        Apartment noteg     2016-2024 <NA>
#> 44148 Schaffhausen        Apartment noteg     1961-1970 <NA>
#> 44889       Zurich     Single house           2016-2024 <NA>
#> 44890       Zurich Bifamiliar house           2016-2024 <NA>
#> 47001       Ticino        Apartment noteg     2011-2015 <NA>
#> 47621      Thurgau        Apartment noteg     1991-2000 <NA>
#>       Canton_code
#> 1            <NA>
#> 2            <NA>
#> 3            <NA>
#> 4            <NA>
#> 5            <NA>
#> 6            <NA>
#> 2253         <NA>
#> 2254         <NA>
#> 11886        <NA>
#> 11887        <NA>
#> 11888        <NA>
#> 11889        <NA>
#> 11890        <NA>
#> 17993        <NA>
#> 17994        <NA>
#> 17995        <NA>
#> 17996        <NA>
#> 17997        <NA>
#> 17998        <NA>
#> 17999        <NA>
#> 18000        <NA>
#> 18001        <NA>
#> 18002        <NA>
#> 18003        <NA>
#> 18004        <NA>
#> 18005        <NA>
#> 19603        <NA>
#> 19604        <NA>
#> 19605        <NA>
#> 19606        <NA>
#> 19607        <NA>
#> 19608        <NA>
#> 19609        <NA>
#> 19610        <NA>
#> 19611        <NA>
#> 19612        <NA>
#> 19613        <NA>
#> 27169        <NA>
#> 27170        <NA>
#> 30708        <NA>
#> 33490        <NA>
#> 33927        <NA>
#> 35207        <NA>
#> 35208        <NA>
#> 35209        <NA>
#> 35210        <NA>
#> 35211        <NA>
#> 35212        <NA>
#> 35213        <NA>
#> 35214        <NA>
#> 35215        <NA>
#> 35216        <NA>
#> 35217        <NA>
#> 40817        <NA>
#> 40818        <NA>
#> 40819        <NA>
#> 40820        <NA>
#> 40821        <NA>
#> 42848        <NA>
#> 42861        <NA>
#> 43231        <NA>
#> 43232        <NA>
#> 43233        <NA>
#> 44144        <NA>
#> 44145        <NA>
#> 44146        <NA>
#> 44147        <NA>
#> 44148        <NA>
#> 44889        <NA>
#> 44890        <NA>
#> 47001        <NA>
#> 47621        <NA>

We have 144 NAN, where

  • The zip code was not found in the atmo df
  • The zip code was incorectly isolated from the address

Removed them ::: {.cell layout-align=“center”}

:::

3.3 Tax data cleaning

3.3.1 Merging the two datasets

##Dataset used for the rest of the analysis ::: {.cell layout-align=“center”}

:::

3.4 Cleaning of commune data

Replaces NAs in both Taux de couverture social and Political (Conseil National Datas) For Taux de couverture Social: NAs were due to reason “Q” = “Not indicated to protect confidentiality” We replaced the NAs by the average taux de couverture in Switzerland in 2019, which was 3.2%

For Political data: NAs were due to reason “M” = “Not indicated because data was not important or applicable” Therefore, we replaced the NAs by 0

4 EDA

4.1 Change the path below

4.2 Histogram of prices

4.3 Histogram of prices for each property type

note : only price between 0 and 500000 so some outliers aren’t here

4.4 Histogram of prices for each year category

note : only price between 0 and 500000 so some outliers aren’t here

4.5 Histogram of prices for each canton

note : only price between 0 and 500000 so some outliers aren’t here

4.6 Histogram of prices for each number of rooms

note : only price between 0 and 500000 so some outliers aren’t here

and the graph below only show apartments with less than 10 rooms (but you can change the code if needed

4.7 Histogram of prices with impot

4.8 Test Regression

#> 
#> Call:
#> lm(formula = price ~ number_of_rooms + canton + property_type + 
#>     year_category, data = properties)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -7013788  -514438  -138948   264464 21628996 
#> 
#> Coefficients:
#>                               Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                    -677158      55739  -12.15  < 2e-16
#> number_of_rooms                 337946       6166   54.81  < 2e-16
#> cantonappenzell-ausser-rhoden  -464945     126861   -3.66  0.00025
#> cantonappenzell-inner-rhoden   -874289     392590   -2.23  0.02596
#> cantonbasel-landschaft         -195701      57943   -3.38  0.00073
#> cantonbasel-stadt               218682     105130    2.08  0.03753
#> cantonbern                     -478376      46221  -10.35  < 2e-16
#> cantonfribourg                 -781416      48366  -16.16  < 2e-16
#> cantongeneva                   2025260      62234   32.54  < 2e-16
#> cantonglarus                   -573694     173301   -3.31  0.00093
#> cantongrisons                    59982      71666    0.84  0.40262
#> cantonjura                     -801519      77323  -10.37  < 2e-16
#> cantonlucerne                  -187979      73261   -2.57  0.01030
#> cantonneuchatel                -353635      65590   -5.39  7.1e-08
#> cantonnidwalden                 991055     244826    4.05  5.2e-05
#> cantonobwalden                  366062     244712    1.50  0.13470
#> cantonschaffhausen             -584997     120601   -4.85  1.2e-06
#> cantonschwyz                     18070     132558    0.14  0.89157
#> cantonsolothurn                -784557      61024  -12.86  < 2e-16
#> cantonst-gallen                -404890      55918   -7.24  4.6e-13
#> cantonthurgau                   -37337      63444   -0.59  0.55620
#> cantonticino                    125913      38499    3.27  0.00108
#> cantonuri                         9578     155772    0.06  0.95097
#> cantonvalais                   -219964      39781   -5.53  3.3e-08
#> cantonvaud                       89914      40258    2.23  0.02553
#> cantonzug                       801241     153896    5.21  1.9e-07
#> cantonzurich                    316099      49688    6.36  2.0e-10
#> property_typeAttic flat         311019      45964    6.77  1.4e-11
#> property_typeBifamiliar house    41841      42939    0.97  0.32986
#> property_typeChalet            1136804      56690   20.05  < 2e-16
#> property_typeDuplex              -5091      56699   -0.09  0.92846
#> property_typeFarm house         237939     118848    2.00  0.04529
#> property_typeLoft               285442     291977    0.98  0.32827
#> property_typeRoof flat            4801      64587    0.07  0.94074
#> property_typeRustic house      -281265     249068   -1.13  0.25880
#> property_typeSingle house       389066      24252   16.04  < 2e-16
#> property_typeTerrace flat        88662      87071    1.02  0.30856
#> property_typeVilla             1278283      38187   33.47  < 2e-16
#> year_category1919-1945           10462      61602    0.17  0.86515
#> year_category1946-1960           76025      57261    1.33  0.18429
#> year_category1961-1970          232055      48444    4.79  1.7e-06
#> year_category1971-1980          210609      43422    4.85  1.2e-06
#> year_category1981-1990          237789      43679    5.44  5.3e-08
#> year_category1991-2000          477554      45385   10.52  < 2e-16
#> year_category2001-2005          519338      55369    9.38  < 2e-16
#> year_category2006-2010          591351      48030   12.31  < 2e-16
#> year_category2011-2015          724194      47219   15.34  < 2e-16
#> year_category2016-2024          641233      36926   17.37  < 2e-16
#>                                  
#> (Intercept)                   ***
#> number_of_rooms               ***
#> cantonappenzell-ausser-rhoden ***
#> cantonappenzell-inner-rhoden  *  
#> cantonbasel-landschaft        ***
#> cantonbasel-stadt             *  
#> cantonbern                    ***
#> cantonfribourg                ***
#> cantongeneva                  ***
#> cantonglarus                  ***
#> cantongrisons                    
#> cantonjura                    ***
#> cantonlucerne                 *  
#> cantonneuchatel               ***
#> cantonnidwalden               ***
#> cantonobwalden                   
#> cantonschaffhausen            ***
#> cantonschwyz                     
#> cantonsolothurn               ***
#> cantonst-gallen               ***
#> cantonthurgau                    
#> cantonticino                  ** 
#> cantonuri                        
#> cantonvalais                  ***
#> cantonvaud                    *  
#> cantonzug                     ***
#> cantonzurich                  ***
#> property_typeAttic flat       ***
#> property_typeBifamiliar house    
#> property_typeChalet           ***
#> property_typeDuplex              
#> property_typeFarm house       *  
#> property_typeLoft                
#> property_typeRoof flat           
#> property_typeRustic house        
#> property_typeSingle house     ***
#> property_typeTerrace flat        
#> property_typeVilla            ***
#> year_category1919-1945           
#> year_category1946-1960           
#> year_category1961-1970        ***
#> year_category1971-1980        ***
#> year_category1981-1990        ***
#> year_category1991-2000        ***
#> year_category2001-2005        ***
#> year_category2006-2010        ***
#> year_category2011-2015        ***
#> year_category2016-2024        ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1240000 on 21363 degrees of freedom
#>   (72 observations deleted due to missingness)
#> Multiple R-squared:  0.323,  Adjusted R-squared:  0.321 
#> F-statistic:  216 on 47 and 21363 DF,  p-value: <2e-16

5 Supervised learning

  • Data splitting (if a training/test set split is enough for the global analysis, at least one CV or bootstrap must be used)
  • Two or more models
  • Two or more scores
  • Tuning of one or more hyperparameters per model
  • Interpretation of the model(s)

6 Unsupervised learning

  • Clustering and/or dimension reduction

7 Conclusion

  • Brief summary of the project
  • Take home message
  • Limitations
  • Future work?